Reliable estimation of Latent Variable Models

Marcos Jimenez, Mauricio Garnier-Villarreal, & Vithor Rosa Franco

2026-02-22

Latent Variable Modeling

A latent variable model connects things we can measure directly (called observed or manifest variables) to hidden qualities we cannot measure directly (called latent variables). These models are used in many areas, such as biology, computer science, and the social sciences. Latent variable models can involve either categorical or continuous observed and latent variables, as shown in the table below:

Latent \ Manifest   Continuous (Manifest)      Categorical (Manifest)
Continuous          Factor Analysis            Item Response Theory
Categorical         Latent Profile Analysis    Latent Class Analysis


The latent R package

  • lavaan syntax for Structural Equation Models
  • Core functions written in C++ with the Armadillo library
  • Parallelization of multiple random starts to address local maxima
  • Customizable models

Latent Class Analysis Example

Model fitting:

fit <- lca(data = cancer[, 1:6], nclasses = 3L,
           item = c("gaussian", "gaussian",
                    "multinomial", "multinomial",
                    "gaussian", "gaussian"))
fit@loglik                # -5784.701
fit@penalized_loglik      # -5795.573
fit@timing                # 0.1927969

Extract model information:

# Get fit indices:
getfit(fit)

# Inspect model objects:
latInspect(fit, what = "coefs", digits = 3)
latInspect(fit, what = "classes", digits = 3)
latInspect(fit, what = "profile", digits = 3)
latInspect(fit, what = "posterior", digits = 3)

# Get confidence intervals:
CI <- ci(fit, type = "standard",
         confidence = 0.95, digits = 2)

Two-step LCA analysis with covariates

  • Step 1: fit the measurement model without the covariates:

# Measurement model:
fit0 <- lca(data = empathy[, 1:6], nclasses = 4L,
            item = rep("gaussian", ncol(empathy[, 1:6])))

  • Step 2: fit the model with the covariates, fixing the measurement part at the Step 1 estimates:

fit <- lca(data = empathy[, 1:6],
           X = empathy[, 7:8],
           model = fit0,
           item = rep("gaussian", ncol(empathy[, 1:6])),
           nclasses = 4L)
fit@loglik                # -1806.426
fit@penalized_loglik      # -1809.614
fit@Optim$iterations      # 38
fit@Optim$convergence     # TRUE
fit@timing                # 0.1596549

Factor Analysis

Factor Analysis (FA) is a method that estimates the influence of \(K\) continuous latent variables on a set of \(J\) items.

The score in item \(j\) is a weighted sum of the \(K\) latent factors: \[ X_j = \sum_{k=1}^K \lambda_{jk}F_k + \epsilon_j. \]

Assuming the errors are uncorrelated with the factors, the \(J\) regressions imply a model for the covariance matrix of the items:

\[ S = \Lambda \Psi \Lambda^\top + \Theta. \]

  • \(\Lambda\) is a \(J \times K\) matrix containing the regression coefficients.

  • \(\Psi\) is the correlation matrix between the \(K\) latent factors.

  • \(\Theta\) is the error covariance matrix.
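To make the structure concrete, here is a small base-R check with invented loadings (two factors, six standardized items; all numbers are illustrative, not package output):

```r
# Invented loadings: items 1-3 load on factor 1, items 4-6 on factor 2.
Lambda <- matrix(c(0.8, 0.7, 0.6, 0,   0,   0,
                   0,   0,   0,   0.9, 0.8, 0.5),
                 nrow = 6, ncol = 2)
Psi    <- matrix(c(1, 0.3, 0.3, 1), 2, 2)        # factor correlations
common <- Lambda %*% Psi %*% t(Lambda)           # common-factor part
Theta  <- diag(1 - diag(common))                 # unique variances
Sigma  <- common + Theta                         # model-implied covariance
all(abs(diag(Sigma) - 1) < 1e-12)                # TRUE: unit variances
min(eigen(Sigma, symmetric = TRUE)$values) > 0   # TRUE: positive-definite
```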

Positive-definite constraints

In the factor model equation, \[ \Lambda \color{red}{\Psi} \Lambda^\top + \color{red}{\Theta}, \]

the latent correlations \(\color{red}{\Psi}\) and error covariances \(\color{red}{\Theta}\) should be at least positive-semidefinite, but unconstrained estimation does not guarantee this…

Positive-definite constraints (lavaan fails)

Let’s force an instance where lavaan converges to an improper solution (a Heywood case: the estimated error variance of x6 is negative).

library(lavaan)
model <- 'visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6
          speed   =~ x7 + x8 + x9
          x1 ~~ x5
          x1 ~~ x4
          x4 ~~ x5
          x4 ~~ x6'
fit2 <- cfa(model, data = HolzingerSwineford1939,
            estimator = "ml", std.lv = TRUE, std.ov = TRUE)
inspect(fit2, what = "est")$theta      # Error covariances
       x1     x2     x3     x4     x5     x6     x7     x8     x9
x1  0.455                                                        
x2  0.000  0.805                                                 
x3  0.000  0.000  0.618                                          
x4  0.084  0.000  0.000  0.244                                   
x5  0.060  0.000  0.000  0.132  0.521                            
x6  0.000  0.000  0.000 -0.202  0.000 -0.087                     
x7  0.000  0.000  0.000  0.000  0.000  0.000  0.667              
x8  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.455       
x9  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.575
det(inspect(fit2, what = "est")$theta) # Check the determinant
[1] -0.00118551
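The negative determinant already signals the violation. As a base-R sketch (hard-coding the rounded estimates printed above), an eigenvalue check makes it explicit; the negative error variance of x6 alone guarantees a negative eigenvalue:

```r
# Rounded error-covariance estimates from the lavaan output above,
# re-entered by hand (non-zero entries only).
Theta <- diag(c(0.455, 0.805, 0.618, 0.244, 0.521, -0.087,
                0.667, 0.455, 0.575))
Theta[1, 4] <- Theta[4, 1] <-  0.084
Theta[1, 5] <- Theta[5, 1] <-  0.060
Theta[4, 5] <- Theta[5, 4] <-  0.132
Theta[4, 6] <- Theta[6, 4] <- -0.202
min(eigen(Theta, symmetric = TRUE)$values)  # negative: Theta is indefinite
```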

Positive-definite constraints (latent converges)

model <- 'visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6
          speed   =~ x7 + x8 + x9
          x1 ~~ x5
          x1 ~~ x4
          x4 ~~ x5
          x4 ~~ x6'
fit <- lcfa(data = HolzingerSwineford1939, model = model,
            estimator = "ml", std.lv = TRUE, positive = TRUE,
            control = list(rstarts = 10))
round(latInspect(fit, what = "est")[[1]]$theta, 3)
      x1    x2    x3     x4    x5     x6    x7    x8    x9
x1 0.453 0.000 0.000  0.082 0.041  0.000 0.000 0.000 0.000
x2 0.000 0.805 0.000  0.000 0.000  0.000 0.000 0.000 0.000
x3 0.000 0.000 0.626  0.000 0.000  0.000 0.000 0.000 0.000
x4 0.082 0.000 0.000  0.336 0.124 -0.081 0.000 0.000 0.000
x5 0.041 0.000 0.000  0.124 0.440  0.000 0.000 0.000 0.000
x6 0.000 0.000 0.000 -0.081 0.000  0.077 0.000 0.000 0.000
x7 0.000 0.000 0.000  0.000 0.000  0.000 0.672 0.000 0.000
x8 0.000 0.000 0.000  0.000 0.000  0.000 0.000 0.462 0.000
x9 0.000 0.000 0.000  0.000 0.000  0.000 0.000 0.000 0.571
det(latInspect(fit, what = "est")[[1]]$theta)
[1] 0.0002784249

Positive-definite latent covariances

  • Parameterize latent covariances as crossproducts:

\[ \Psi = Y^\top Y \\ \Theta = U^\top U \]

  • Constrain \(Y\) and \(U\) to be orthoblique, so that the crossproduct reproduces the model’s sparsity pattern: \[ X \in \mathbb{R}^{p\times p}: X^\top X \text{ has zeros wherever the model fixes a covariance to zero} \]

\[ \begin{bmatrix} \color{red}{0.08} & \color{blue}{1.76} & \color{green}{0.04} \\ \color{red}{-1.95} & \color{blue}{-0.12} & \color{green}{-0.69} \\ \color{red}{-0.67} & \color{blue}{-0.08} & \color{green}{2.02} \end{bmatrix}^\top \begin{bmatrix} \color{red}{0.08} & \color{blue}{1.76} & \color{green}{0.04} \\ \color{red}{-1.95} & \color{blue}{-0.12} & \color{green}{-0.69} \\ \color{red}{-0.67} & \color{blue}{-0.08} & \color{green}{2.02} \end{bmatrix} = \begin{bmatrix} 4.24 & 0.42 & 0.00 \\ 0.42 & 3.11 & 0.00 \\ 0.00 & 0.00 & 4.56 \end{bmatrix} \]
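Why the crossproduct parameterization helps: \(Y^\top Y\) is positive-semidefinite for any real \(Y\), so optimizing over \(Y\) rather than over \(\Psi\) directly keeps the estimate in the feasible set. A minimal base-R illustration (with an arbitrary random \(Y\), not output from the package):

```r
set.seed(1)                      # arbitrary illustration
Y   <- matrix(rnorm(9), 3, 3)    # any real square matrix
Psi <- crossprod(Y)              # t(Y) %*% Y
isSymmetric(Psi)                               # TRUE
min(eigen(Psi, symmetric = TRUE)$values) >= 0  # TRUE: PSD by construction
```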

Positive-definite polychoric correlations

  • Parameterize the polychoric correlation matrix as a crossproduct: \[ \Sigma = X^\top X \]
  • Constrain X to be oblique: \[ X \in \mathbb{R}^{p\times p}: \text{diag}(X^\top X) = I \]

\[ \begin{bmatrix} \color{red}{0.04} & \color{blue}{1.00} & \color{green}{-0.40} \\ \color{red}{-0.95} & \color{blue}{-0.07} & \color{green}{-0.40} \\ \color{red}{-0.32} & \color{blue}{-0.04} & \color{green}{0.83} \end{bmatrix}^\top \begin{bmatrix} \color{red}{0.04} & \color{blue}{1.00} & \color{green}{-0.40} \\ \color{red}{-0.95} & \color{blue}{-0.07} & \color{green}{-0.40} \\ \color{red}{-0.32} & \color{blue}{-0.04} & \color{green}{0.83} \end{bmatrix} = \begin{bmatrix} 1.00 & 0.11 & 0.10 \\ 0.11 & 1.00 & -0.41 \\ 0.10 & -0.41 & 1.00 \end{bmatrix} \]
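The unit-diagonal constraint is easy to impose in practice: normalize each column of \(X\) to unit length, so \(\text{diag}(X^\top X) = I\) and \(X^\top X\) is a valid correlation matrix. A base-R sketch with an arbitrary random \(X\):

```r
set.seed(2)                                # arbitrary illustration
X <- matrix(rnorm(9), 3, 3)
X <- sweep(X, 2, sqrt(colSums(X^2)), "/")  # normalize columns to unit length
Sigma <- crossprod(X)                      # t(X) %*% X
round(diag(Sigma), 10)                     # 1 1 1: unit diagonal
```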

Cooking new stuff

  • (Exploratory) Structural Equation Modeling

  • Hidden Markov Models

  • (Multidimensional) Item Response Theory

Release date? Soon

Download the beta version at https://github.com/Marcosjnez/latent

Contact: m.j.jimenezhenriquez@vu.nl